
Tutorials on Using Futures

Henrik Bengtsson
Location: useR! 2022 conference, online, June 20-24, 2022
2022-06-20

Tutorial: An Introduction to Futureverse for Parallel Processing in R

Abstract

In this tutorial, you will learn how to use the future framework to turn sequential R code into parallel R code with minimal effort.

There are several ways to parallelize R code. Some solutions are built into R (the parallel package), and others are provided by R packages on CRAN. The future framework, available on CRAN since 2015 and used by hundreds of R packages, is designed to unify and leverage common parallelization frameworks in R, making new and existing R code faster with minimal effort from the developer.
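At its core, the future framework has a small API: future() starts evaluating an expression, possibly in parallel, and value() collects the result once it is available. The implicit %<-% operator does the same in one step. Below is a minimal sketch, assuming the future package is installed. The two local background workers and the toy expressions are illustrative choices, not part of the tutorial material.

```r
library(future)

# Choose how futures are resolved; 'multisession' evaluates them in
# background R sessions on the local machine (2 workers chosen for illustration)
plan(multisession, workers = 2)

# Explicit API: create a future, then collect its value
f <- future({
  Sys.sleep(1)   # stand-in for a slow computation
  sum(1:100)
})
# ... the main R session is free to do other work here ...
v <- value(f)    # blocks until the future is resolved
print(v)         # [1] 5050

# Implicit API: %<-% creates a future and binds its value to 'w'
w %<-% {
  Sys.sleep(1)
  prod(1:5)
}
print(w)         # [1] 120

# Return to sequential processing
plan(sequential)
```

The same two lines of code (create, then collect) work unchanged whichever backend plan() selects.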

The futureverse (https://futureverse.org) lets you, as the developer, stick with your favorite programming style. For example, future.apply provides one-to-one alternatives to base R’s apply() and lapply() functions, furrr provides alternatives to purrr’s map() functions, and doFuture adds support for foreach’s foreach(...) %dopar% { ... } syntax.
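To make the "stay with your favorite style" idea concrete, here is a minimal sketch that computes the same result with base R's lapply(), future.apply's future_lapply(), furrr's future_map(), and foreach's %dopar% via doFuture. It assumes the packages from the preparation section below are installed; slow_sqrt() is a hypothetical stand-in for real work.

```r
library(future)
library(future.apply)   # future_lapply(), future_sapply(), ...
library(furrr)          # future_map(), future_map_dbl(), ...
library(foreach)
library(doFuture)       # registerDoFuture()

plan(multisession, workers = 2)

# Hypothetical slow function used only for illustration
slow_sqrt <- function(x) {
  Sys.sleep(0.5)
  sqrt(x)
}
xs <- 1:4

# base R lapply() -> future.apply::future_lapply()
y1 <- lapply(xs, slow_sqrt)
y2 <- future_lapply(xs, slow_sqrt)

# purrr::map() -> furrr::future_map()
y3 <- future_map(xs, slow_sqrt)

# foreach() %dopar%, resolved through futures via the doFuture adapter
registerDoFuture()
y4 <- foreach(x = xs) %dopar% slow_sqrt(x)

# All four styles give the same list of results
stopifnot(identical(y1, y2), identical(y1, y3), identical(y1, y4))

plan(sequential)
```

Only the function name changes between the sequential and the parallel versions; the arguments stay the same.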

At the same time, the user can switch to a parallel backend of their choice: they can parallelize on their local machine, across multiple local or remote machines, in the cloud, or through a job scheduler on a high-performance computing (HPC) cluster. As a developer, you do not have to worry about which backend the user picks; your future-based code remains the same regardless of the parallel backend.
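The split between developer and user can be sketched as follows. The hypothetical function my_analysis() never mentions a backend. Only the plan() call changes, here cycling through sequential, multisession, and future.callr's callr backend. This assumes future.apply and future.callr are installed. The remote-cluster line is left commented out because it requires real machines, and the host names "n1" and "n2" are placeholders.

```r
library(future)
library(future.apply)

# Hypothetical developer code: no backend is specified anywhere in here
my_analysis <- function(xs) {
  future_sapply(xs, function(x) {
    Sys.sleep(0.2)   # stand-in for a slow computation
    x^2
  })
}

# The user picks the backend; my_analysis() stays the same
plan(sequential)                        # no parallelization
y_seq <- my_analysis(1:4)

plan(multisession, workers = 2)         # background R sessions on this machine
y_ms <- my_analysis(1:4)

plan(future.callr::callr, workers = 2)  # a fresh callr R process per future
y_callr <- my_analysis(1:4)

# Remote machines follow the same pattern (not run; placeholder host names):
# plan(cluster, workers = c("n1", "n2"))

plan(sequential)
stopifnot(identical(y_seq, y_ms), identical(y_seq, y_callr))
print(y_seq)   # [1]  1  4  9 16
```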

PS. We will not cover asynchronous Shiny programming using futures and promises in this tutorial.

Acknowledgments: This tutorial and other work on the futureverse are funded by the Essential Open Source Software program run by the Chan Zuckerberg Initiative (CZI EOSS #4).

Objectives

After completing this tutorial, my hope is that you:

and understand how the future framework:

Preparing for this tutorial

Before attending the tutorial, please install the following R packages:

install.packages("future")         # ~ 30 secs
install.packages("future.apply")   # ~ 15 secs
install.packages("furrr")          # ~ 60 secs
install.packages("foreach")        # ~ 10 secs
install.packages("doFuture")       # ~ 15 secs
install.packages("doRNG")          # ~ 15 secs
install.packages("plyr")           # ~ 60 secs
install.packages("future.callr")   # ~ 30 secs
install.packages("progressr")      # ~ 15 secs
install.packages("progress")       # ~ 15 secs

The time estimates are for installing each package from source on a fresh Linux R setup with a 1 Gbit/s internet connection. Installing pre-built binaries on macOS and MS Windows is faster.

If you already have some of these installed, please make sure they are up to date before starting this tutorial, e.g. by running:

update.packages()
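To see at a glance which of the tutorial packages are present and which versions you have, a small check like the following may help. It is a sketch that only inspects the local library, so it works offline. The install and targeted-update lines are commented out because they need an internet connection. Restricting update.packages() to this list via its oldPkgs argument is an optional alternative to updating everything.

```r
# The packages listed in the install.packages() calls above
pkgs <- c("future", "future.apply", "furrr", "foreach", "doFuture",
          "doRNG", "plyr", "future.callr", "progressr", "progress")

# TRUE if the package is found in a library, FALSE otherwise
# (system.file() returns "" for packages that are not installed)
installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                    FUN.VALUE = logical(1L))

# Report the installed version of each package found
for (pkg in pkgs[installed]) {
  cat(sprintf("%-14s %s\n", pkg, format(packageVersion(pkg))))
}

missing <- pkgs[!installed]
if (length(missing) > 0L) {
  cat("Not installed:", paste(missing, collapse = ", "), "\n")
  # install.packages(missing)   # uncomment to install only the missing ones
}

# To update only these packages rather than everything (needs internet):
# update.packages(oldPkgs = pkgs, ask = FALSE)
```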

If you have any issues, please reach out for help on https://github.com/HenrikBengtsson/future-tutorial-user2022/discussions/.